Regret optimality in semi-Markov decision processes with an absorbing set

Authors

  • Yoshinobu Kadota
  • Masami Kurano
  • Masami Yasuda
Abstract

The optimization problem for the general utility case is considered for countable-state semi-Markov decision processes. The regret-utility function is introduced as a function of two variables: a target value and a present value. We consider the expectation of the regret-utility function incurred until the time the process reaches a given absorbing set. To characterize the regret-optimal policy, we derive the optimality equation and prove that its solution is unique. As an application, two examples of regret-utility functions illustrate the analysis for these models.

Keywords: regret optimal policy, semi-Markov decision processes, general regret-utility, optimality equation.
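The idea of an expected regret accumulated until absorption can be illustrated with a toy computation. The following is only a minimal sketch under stated assumptions, not the authors' formulation: the three-state model `MODEL`, the action names, the costs, and the piecewise-linear `regret` function are all hypothetical, and each sojourn is collapsed into a single cost increment. The sketch augments the state with the present (accumulated) value and applies a backward recursion that terminates because the toy model has no cycles.

```python
from functools import lru_cache

# Hypothetical toy model (not from the paper). State 2 is the absorbing set.
ABSORBING = {2}

# MODEL[state][action] -> list of (probability, next_state, cost of the sojourn).
# The random sojourn time and cost rate are collapsed into one cost increment.
MODEL = {
    0: {
        "slow": [(0.5, 1, 1.0), (0.5, 1, 3.0)],
        "fast": [(1.0, 1, 2.0)],
    },
    1: {
        "safe": [(1.0, 2, 2.0)],
        "risky": [(0.5, 2, 0.0), (0.5, 2, 3.0)],
    },
}


def regret(target, present):
    """Assumed example of a regret-utility: the excess of present over target."""
    return max(present - target, 0.0)


def solve(target):
    """Return value(state, present) -> (minimal expected regret, best action)."""

    @lru_cache(maxsize=None)
    def value(state, present):
        # On reaching the absorbing set, the regret is evaluated and fixed.
        if state in ABSORBING:
            return regret(target, present), None
        best = None
        for action, outcomes in MODEL[state].items():
            # Expected regret-to-go when taking this action now.
            q = sum(p * value(nxt, present + cost)[0] for p, nxt, cost in outcomes)
            if best is None or q < best[0]:
                best = (q, action)
        return best

    return value


value = solve(target=3.0)
print(value(0, 0.0))  # expected regret and first action from state 0
print(value(1, 1.0))  # low accumulated value in state 1
print(value(1, 3.0))  # high accumulated value in state 1
```

In this toy model the preferred action in state 1 changes with the accumulated value, which is why the recursion is carried out over pairs (state, present value) rather than over states alone.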


Similar resources

Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards

This paper establishes the existence of a solution to the optimality equations in undiscounted semi-Markov decision models with countable state space, under conditions generalizing the hitherto obtained results. In particular, we merely require the existence of a finite set of states in which every pair of states can reach each other via some stationary policy, instead of the traditional and re...


Semi-Markov Decision Processes

Considered are infinite horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given, constraints and variance sensitivity are also discussed.


Exploration-Exploitation in MDPs with Options

While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first a...


On Minimizing Ordered Weighted Regrets in Multiobjective Markov Decision Processes

In this paper, we propose an exact solution method to generate fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on the determination of a policy that fairly shares regrets among agents or objectiv...


Optimal Threshold Probability and Policy Iteration in Semi-Markov Decision Processes

We consider undiscounted semi-Markov decision processes with a target set; our main concern is the problem of minimizing the threshold probability. We formulate the problem as an infinite-horizon case with a recurrent class. We show that the optimal value function is the unique solution to an optimality equation and that there exists a stationary optimal policy. Also several value iteration methods and a polic...




Publication date: 2004